Deriving Word Association Networks from Text Corpora

Authors

  • David Galea
  • Peter Bruza
Abstract

This article presents and evaluates a model for automatically deriving word association networks from text corpora. The evaluation asks to what degree corpus-based word association networks (CANs) can approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious and time-consuming, and is therefore necessarily limited relative to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations, corpus-based processing provided vector representations for words. These representations were then used to derive CANs using two measures: (1) the well-known cosine metric, which is symmetric, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms was used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs do approximate the FANs to an encouraging degree.
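
The contrast between a symmetric and an asymmetric association measure can be illustrated with a small sketch. The abstract does not give the formula for the projection-based measure, so `projection_strength` below is only one plausible reading (the coefficient of the target vector's orthogonal projection onto the cue vector), not the authors' definition. The toy vectors, the `build_network` helper, and the threshold value are likewise hypothetical.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Symmetric measure: cosine of the angle between u and v."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def projection_strength(cue, target):
    """Asymmetric measure (assumed form, not taken from the paper):
    coefficient of the orthogonal projection of `target` onto `cue`."""
    return dot(cue, target) / dot(cue, cue)

def build_network(vectors, measure, threshold):
    """Hypothetical helper: keep a directed edge cue -> target
    whenever measure(cue, target) reaches the threshold."""
    edges = {}
    for cue, cue_vec in vectors.items():
        for target, target_vec in vectors.items():
            if cue == target:
                continue
            strength = measure(cue_vec, target_vec)
            if strength >= threshold:
                edges[(cue, target)] = strength
    return edges

# Toy word vectors, invented for illustration only.
vectors = {
    "doctor": [4.0, 2.0, 0.0],
    "nurse": [2.0, 1.0, 1.0],
    "bread": [0.0, 1.0, 4.0],
}

cosine_net = build_network(vectors, cosine, threshold=0.4)
projection_net = build_network(vectors, projection_strength, threshold=0.4)
```

With these toy vectors, the cosine network contains each edge in both directions or not at all, while the projection network keeps `nurse -> bread` but drops `bread -> nurse`. Directional differences of this kind are what an asymmetric measure can capture and a symmetric one cannot.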

Similar resources

Constructing Word-Sense Association Networks from Bilingual Dictionary and Comparable Corpora

A novel thesaurus named a “word-sense association network” is proposed for the first time. It consists of nodes representing word senses, each of which is defined as a set consisting of a word and its translation equivalents, and edges connecting topically associated word senses. This word-sense association network is produced from a bilingual dictionary and comparable corpora by means of a new...

An Improved Method for Deriving Word Meaning from Lexical Co-Occurrence

The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are deri...

Word Association Thesaurus as a Resource for extending Semantic Networks

The paper reports ongoing research on applying psycholinguistic resources to building and extending semantic networks. We survey different kinds of information that can be extracted from a Word Association Thesaurus (WAT), a resource representing the results of a large-scale free association test. In addition, we give a comparison of WAT and other language resources (e.g. text corpora, e...

An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence

The lexical semantic system is an important component of human language and cognitive processing. One approach to modeling semantic knowledge makes use of hand-constructed networks or trees of interconnected word senses (Miller, Beckwith, Fellbaum, Gross, & Miller, 1990; Jarmasz & Szpakowicz, 2003). An alternative approach seeks to model word meanings as high-dimensional vectors, which are deri...

Automatic Acquisition of a High-Precision Translation Lexicon from Parallel Chinese-English Corpora

This paper presents a hybrid approach to deriving a translation lexicon from unaligned parallel Chinese-English corpora. Two types of information, namely, proximity and document-external distributions of word pairs, are proposed to enhance the precision of the translation lexicon derived from statistical and dictionary-based methods. The former can identify translations of Chinese compounds, wh...

Publication date: 2015